Method and apparatus for formatting documents.

A method and an apparatus for formatting a
document containing command codes indicating
prescribed commands concerning structural function,
which are simply manipulatable as well as
consistently correct. The method includes the steps of:
structurally analyzing the document and deriving
structural information of the document; detecting the
command codes and adjusting the analysis in
accordance with the indications of the command codes;
and formatting the document in accordance with the
structural information. An apparatus for carrying out
the method is also disclosed.
METHOD AND APPARATUS FOR FORMATTING DOCUMENT



BACKGROUND OF THE INVENTION 



Field of the Invention 

The present invention relates to a method and 
an apparatus for formatting a document in accor- 
dance with structures of the document. 



Description of the Background Art 

Recently, there has been remarkable progress
in so-called desk-top publishing as well as
in Japanese word-processors, in addition to the
advances in output devices such as displays
and printers, which have made it possible to produce
documents in much more diverse and effective
manners.

However, producing documents which are both
neat-looking and easily readable requires a thorough
understanding of the operations, commands and
formats of the document formatting apparatus to be
used, which makes such a document formatting
apparatus almost inaccessible to those without
formal training.

As a solution to this situation, progress has
been made in developing an automatic document
formatting system which utilizes an automatically
extracted logical structure of a document, along
with a document formatting system capable of
arranging figures and articles according to
automatically derived referential relationships between
figures and articles.

Although such logical structures and referential
relationships are generally sufficient to provide
the structural information on the document
necessary for effective formatting, ambiguities involved in
human languages may lead to misapprehension of
the logical structures and referential relationships.
Furthermore, apart from this problem, the outputs of
these automatic document formatting apparatuses
may not satisfy personal or temporal demands of a
user.

On the other hand, there are document formatting
systems which utilize command codes indicating
manners of formatting to be embedded in the
document, such as 'Roff'. More recently, there
appeared document formatting systems such as
'Scribe' or 'TeX' which hold document data and
format data independently, so that a change in
command codes can be made at once on the format
data, without looking for every embedded
command code as in 'Roff'. Moreover, 'Scribe' and
'TeX' are capable of performing more sophisticated
formatting than others. But, in these document
formatting systems, a thorough understanding of
command codes is indispensable for skillful
maneuvering. Furthermore, even in 'Scribe' and 'TeX',
embedding of command codes is necessary, which
can easily be tedious. Though the automatic document
formatting apparatuses mentioned above are free
of such problems concerning command codes,
they are, as described above, prone to the
misapprehension of the logical structures and
referential relationships due to the ambiguities in
human languages, and the outputs of these automatic
document formatting apparatuses may not satisfy
personal or temporal demands of a user.

Thus, with conventional document formatting
apparatuses, either a possibility of misapprehension
resulting from automatic extraction of document
structures or else difficulties in dealing with
command codes, which need to be embedded in
the document and thoroughly mastered by the
user, has to be tolerated.






SUMMARY OF THE INVENTION 



It is therefore an object of the present invention
to provide a method and an apparatus for formatting
a document which are simply manipulatable as well
as consistently correct.

According to one aspect of the present invention,
there is provided an apparatus for formatting a
document which contains command codes indicating
prescribed commands concerning structural
function, comprising: means for entering the
document to the apparatus; means for structurally
analyzing the document and deriving structural
information of the document; means for detecting the
command codes and adjusting the analysis by the
structurally analyzing means in accordance with the
indications of the command codes; and means for
carrying out formatting of the document in
accordance with the structural information.

According to another aspect of the invention,
there is provided a method of formatting a
document which contains command codes indicating
prescribed commands concerning structural
function, comprising the steps of: structurally analyzing
the document and deriving structural information of
the document; detecting the command codes and
adjusting the analysis at the structurally analyzing
step in accordance with the indications of the
command codes; and formatting the document in
accordance with the structural information.

Other features and advantages of the present
invention will become apparent from the following
description taken in conjunction with the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram of a document 
formatting apparatus according to one embodiment 
of the present invention. 

Fig. 2 is a tabulated illustration of examples 
of command codes to be utilized in the document 
formatting apparatus shown in Fig. 1. 

Fig. 3 is another tabulated illustration of
logical structures to be utilized in the document
formatting apparatus shown in Fig. 1.

Fig. 4 is a flow chart for the operation of
formatting by the document formatting apparatus
shown in Fig. 1.

Fig. 5 is an illustration of a document being 
formatted by the document formatting apparatus 
shown in Fig. 1, showing manners in which the 
command codes are used in this embodiment. 

Fig. 6(A) and (B) are tabulated illustrations of 
logical structures for the document shown in Fig. 5, 
obtained without and with the command codes. 

Fig. 7 is another illustration of a document
being formatted by the document formatting
apparatus shown in Fig. 1, showing manners in which
the command codes are used in this embodiment.



DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Referring now to Fig. 1, there is shown one 
embodiment of a document formatting apparatus 
according to the present invention. 

This document formatting apparatus comprises:
an input unit 10 from which a document to be
formatted, with command codes embedded, is
entered; an original document memory 20 for storing
the document with the command codes entered at
the input unit 10; a format memory 30 for storing a
format into which the document is to be formatted;
an analyzing unit 40 comprising a command code
analyzing unit 50 for analyzing contents of the
command codes, and a document structure
analyzing unit 60 for analyzing logical and referential
structures of the document in accordance with the
analyzed contents of the command codes; a
document structure memory 70 for storing the analyzed
structures of the document; a formatting processing
unit 80 for carrying out formatting in accordance
with the structures of the document stored in the
document structure memory 70 and the format
stored in the format memory 30; an output unit 90
for presenting the document as formatted by the
formatting processing unit 80; and an
administration unit 99 for administering the operations of all
these parts of this document formatting apparatus
mentioned above.

The input unit 10 may take a form of a
keyboard, a mouse, or a communication network. The
output unit 90 may take a form of a CRT, a display,
or a printer.

In the analyzing unit 40, when the command
codes are present in the document, the command
code analyzing unit 50 adjusts operation of the
document structure analyzing unit 60 such that the
contents of the command codes are reflected in the
manner of analyzing the logical and referential
structures of the document. Thus, in this
embodiment a user can deliberately control the analysis of
the logical and referential structures. Moreover, the
command code analyzing unit 50 also deciphers
those command codes which are directly
concerned with layout of the document, so that the
user can also have control over the layout of the
document.

An example of a set of command codes to be
utilized in this embodiment is shown in Fig. 2. As
given in the section (0) of Fig. 2, any command
code begins with a prescribed command symbol in
this embodiment, and what follows this symbol
designates a type of command code. Various
different types of command code and their
corresponding logical attributes are summarized in the
section (1) of Fig. 2. For example, the command
symbol followed by DATE indicates that what follows is
the date, the command symbol followed by NODC
indicates that what follows is the document number, and
so on. In addition, there are analysis prohibition
codes shown in the section (2) of Fig. 2 which
indicate a portion of the document not to be
structurally analyzed, such as those containing
mathematical formulae and those requiring a special type
of formatting. Furthermore, there are graphic
reference codes as shown in the section (3) of Fig. 2
which indicate the presence of a reference from the
document to graphics, as well as the location of
data on the graphics being referred to and the layout
of the graphics, i.e., how the graphics are to be
incorporated into the final output of the document.
There is also a compulsory return code shown in
the section (4) of Fig. 2 which indicates forcible
changing to the next line, regardless of the logical
structure.
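
The four sections of Fig. 2 can be pictured as a small lookup table. The
following Python fragment is only an illustrative sketch, not the disclosed
implementation: the character "@" is a hypothetical stand-in for the command
symbol, the function name classify is invented here, and only the code names
that appear in this description are listed (the remaining logical structure
codes of Fig. 2 are omitted).

```python
# "@" is an assumed stand-in for the command symbol of Fig. 2, section (0).
SYMBOL = "@"

# Section (1): only the codes named in the description are listed here.
LOGICAL_CODES = {
    "DATE": "date",
    "NODC": "document number",
    "AUTH": "author",
    "SECT": "organization of the author",
}
# Section (2): analysis prohibition codes.
PROHIBITION_CODES = {"X": "start line", "Y": "end line"}
# Section (3): graphic reference codes.
GRAPHIC_CODES = {
    "S": "reference start point",
    "ZF": "referred file",
    "ZP": "referred graphic layout",
    "E": "reference end point",
}
# Section (4): compulsory return code.
RETURN_CODES = {"C": "compulsory return"}


def classify(code: str) -> str:
    """Return the Fig. 2 section a code name belongs to (hypothetical helper)."""
    if code in LOGICAL_CODES:
        return "logical structure"
    if code in PROHIBITION_CODES:
        return "analysis prohibition"
    if code in GRAPHIC_CODES:
        return "graphic reference"
    if code in RETURN_CODES:
        return "compulsory return"
    return "unknown"
```

A code name that falls in none of the four sections is reported as "unknown",
which corresponds to the error case handled at the step 112 of Fig. 4.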

An example of the logical structure in the
document structure memory 70 is shown in Fig. 3. The
logical structure in the document structure memory
70 comprises a sentence number labelling each
sentence of the document, an attribute for each
sentence, a level given to each attribute, and a
header pattern for relevant sentences. For example,
in Fig. 3 a sentence labelled by the sentence
number 4 is given an attribute of Paragraph End in
response to the command code HEAD
present in that sentence, whose level is given as 3
and which has no header pattern. Such a logical
structure will be utilized along with the format data
in the format memory 30 in carrying out formatting
of the document.
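
One entry of such a logical structure can be sketched as a simple record.
The Python fragment below is an illustration under assumptions: the class
name StructureEntry and its field names are invented for this sketch, and
only the four items named above (sentence number, attribute, level, header
pattern) are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructureEntry:
    """One row of the logical structure held in the structure memory."""
    sentence_number: int                  # label of the sentence
    attribute: str                        # logical attribute of the sentence
    level: int                            # level given to the attribute
    header_pattern: Optional[str] = None  # None when there is no header pattern


# The example row described for Fig. 3: sentence 4, Paragraph End, level 3,
# no header pattern.
entry = StructureEntry(sentence_number=4, attribute="Paragraph End", level=3)
```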

Referring now to Fig. 4, the operation of docu- 
ment formatting by this document formatting ap- 
paratus will be described. 

At the step 100, the document with the com- 
mand codes is entered from the input unit 10 and 
stored in the original document memory 20. 

Then, one sentence of the document is read
out from the original document memory 20 and
given to the analyzing unit 40 at the step 101.

Then at the step 102, whether there is any 
command code in this sentence is determined by 
the command code analyzing unit 50. 

When there is no command code in the
sentence, an ordinary automatic extraction of logical
and referential structures of the document is
carried out by the document structure analyzing unit
60 at the step 103 and the process proceeds to the
step 113 to be explained below.

On the other hand, when there is a command 
code in the sentence, the step 104 will be taken in 
which whether the command code present in the 
sentence is one of the logical structure codes is 
determined by the command code analyzing unit 
50. 

When the command code is one of the logical 
structure codes, the logical structure of the sen- 
tence is extracted by the document structure ana- 
lyzing unit 60 at the step 105 in accordance with 
what the command code indicates and the process 
proceeds to the step 113. 

Otherwise, the step 106 will be taken in which 
whether the command code present in the sen- 
tence is one of the analysis prohibition codes is 
determined by the command code analyzing unit 
50. 

When the command code is one of the analy- 
sis prohibition codes, the structural analysis by the 
document structure analyzing unit 60 is controlled 
at the step 107 in accordance with what the com- 
mand code indicates and the process proceeds to 
the step 113. 

Otherwise, the step 108 will be taken in which 
whether the command code present in the sen- 
tence is one of the graphic reference codes is 
determined by the command code analyzing unit 
50. 

When the command code is one of the graphic 
reference codes, the referential structure of the 
sentence is extracted by the document structure 
analyzing unit 60 at the step 109 in accordance 
with what the command code indicates and the 
process proceeds to the step 113. 



Otherwise, the step 110 will be taken in which 
whether the command code present in the sen- 
tence is the compulsory return code is determined 
by the command code analyzing unit 50. 

When the command code is the compulsory
return code, the information of the compulsory
return is deciphered and extracted by the command
code analyzing unit 50 at the step 111 and the
process proceeds to the step 113.

Otherwise, the command code present in the
sentence is in error since it is not any one of the
command codes given in Fig. 2, so at the step 112
the error in the command code is corrected, and
the process proceeds to the step 113.

At the step 113, the result of the logical and
referential structures obtained up to this point is
stored in the document structure memory 70.

Then at the step 114, whether all the sentences
in the document have been checked is determined.
When all the sentences in the document have not
been checked, the process returns to the step 101
and the steps following will be repeated.

Otherwise the process proceeds to the step
115, at which the formatting of the document is
carried out by the formatting processing unit 80 in
accordance with the logical and referential
structures stored in the document structure memory 70
as well as with the format stored in the format
memory 30, and as the resulting formatted
document is outputted by the output unit 90 the process
terminates.
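
The sentence-by-sentence flow of Fig. 4 can be sketched as a loop with a
chain of tests. The Python fragment below is a minimal sketch under stated
assumptions, not the disclosed implementation: "@" is a hypothetical
stand-in for the command symbol, automatic_extraction is a placeholder for
the ordinary automatic analysis, only the first command code of a sentence
is examined, and the correction of the step 112 is reduced to recording the
offending code, since the manner of correction is not detailed here.

```python
import re

SYMBOL = "@"  # assumed stand-in for the command symbol
CODE_RE = re.compile(re.escape(SYMBOL) + r"([A-Z]+)")

LOGICAL = {"DATE", "NODC", "AUTH", "SECT"}  # codes named in the description
PROHIBITION = {"X", "Y"}
GRAPHIC = {"S", "ZF", "ZP", "E"}
RETURN = {"C"}


def automatic_extraction(sentence):
    """Placeholder for the ordinary automatic extraction (step 103)."""
    return "unclassified"


def analyze(sentences):
    """Return a list standing in for the document structure memory."""
    structure = []
    for number, sentence in enumerate(sentences, start=1):  # step 101
        match = CODE_RE.search(sentence)                     # step 102
        if match is None:
            result = ("automatic", automatic_extraction(sentence))  # step 103
        else:
            code = match.group(1)
            if code in LOGICAL:         # step 104, then step 105
                result = ("logical", code)
            elif code in PROHIBITION:   # step 106, then step 107
                result = ("prohibition", code)
            elif code in GRAPHIC:       # step 108, then step 109
                result = ("graphic", code)
            elif code in RETURN:        # step 110, then step 111
                result = ("return", code)
            else:                       # step 112: code not in Fig. 2
                result = ("error", code)
        structure.append((number,) + result)                 # step 113
    return structure  # loop end is step 114; formatting (step 115) follows


document = [
    "INTELLIGENT DOCUMENT PROCESSING SYSTEM",
    "@AUTH RACHI YOZAN",
    "@SECT NICHIBEI SOFTWARE",
    "@C",
]
structure = analyze(document)
```

The order of the tests mirrors the order of the steps 104, 106, 108 and 110,
so a code is handled by the first category it matches.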

One practical example of the document and
the use of the command codes is shown in Fig. 5.
In this example of Fig. 5, which is a report
entitled 'Intelligent document processing system',
the first line "INTELLIGENT DOCUMENT
PROCESSING SYSTEM" can be identified as a title by
the ordinary automatic logical structure extraction,
so that no command code is necessary in this first
line. On the other hand, the second line "RACHI
YOZAN" will most likely not be identifiable as a name
of the author, as Rachi Yozan is a very rare name,
so that this name cannot be found in a name
database. Thus, that "RACHI YOZAN" is a name of the
author is indicated by placing the command code
AUTH, preceded by the command symbol, at the top
of this second line. Accordingly, the document
structure analyzing unit 60 can construe "RACHI YOZAN"
as a name of the author correctly. Likewise,
"NICHIBEI SOFTWARE" in the third line can be
identified correctly as a name of the organization to
which the author belongs by placing the command
code SECT at the top of this third line. As for the fourth
line, there is a compulsory return code at the top of
this fourth line, as this fourth line is to be left blank.
The fifth line "1. INTRODUCTION" can be
identified as a section header with 'introduction' as a
reserved word by the ordinary automatic logical
structure extraction, so that no command code is
necessary in this fifth line. Also, the thirty-third line
"S Fig. 10 ZF bunsho-1 ZP d E", in which each of the
codes S, ZF, ZP and E is preceded by the command
symbol, means that "Fig. 10" is in the file 'bunsho-1'
and it is to be laid out in the lower part of the
current page.
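
The graphic reference in the thirty-third line of Fig. 5 can be picked apart
with a single pattern. The Python fragment below is a hedged sketch: "@" is a
hypothetical stand-in for the command symbol, and the function name
parse_graphic_reference and the field names label, file and layout are
invented for this illustration.

```python
import re

SYMBOL = re.escape("@")  # assumed stand-in for the command symbol

# Reference start (S), referred file (ZF), layout (ZP), reference end (E).
GRAPHIC_RE = re.compile(
    SYMBOL + r"S\s*(?P<label>.*?)\s*"
    + SYMBOL + r"ZF\s*(?P<file>.*?)\s*"
    + SYMBOL + r"ZP\s*(?P<layout>.*?)\s*"
    + SYMBOL + r"E"
)


def parse_graphic_reference(line):
    """Return the reference label, file and layout code, or None."""
    match = GRAPHIC_RE.search(line)
    return match.groupdict() if match else None


reference = parse_graphic_reference(
    "method indicated by @S Fig. 10 @ZF bunsho-1 @ZP d @E."
)
```

For the example line this yields the label "Fig. 10", the file "bunsho-1" and
the layout code "d", which the description associates with the lower part of
the current page.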

How the logical structure obtained by the docu- 
ment structure analyzing unit 60 is affected by the 
presence of the command codes in the example of 
Fig. 5 is shown in Fig. 6(A) and (B), where Fig. 6(A) 
shows the logical structure obtained from the docu- 
ment without the command codes and Fig. 6(B) 
shows the logical structure obtained from the docu- 
ment with the command codes. As can be seen 
from Fig. 6, without the command codes, i.e., by a 
completely automatic logical structure extraction, 
the second and the third lines are construed in- 
correctly as sub-titles in Fig. 6(A), whereas with the 
use of the command codes this misapprehension 
can be avoided in Fig. 6(B). 

Another practical example of the document and
the use of the command codes, in particular the
use of the analysis prohibition codes, is shown in
Fig. 7.

In this example of Fig. 7, which is a portion of
an article containing mathematical formulae, the
analysis prohibition start line code X and
the analysis prohibition end line code Y, each
preceded by the command symbol,
are placed at a top and a bottom, respectively, of
the mathematical formulae so that this portion will
not be structurally analyzed, as the meaning of the
mathematical formulae is not analyzable by the
document structure analyzing unit 60. Apart from
the mathematical formulae, the analysis prohibition
codes can similarly be used for those portions
which are written in special or personal manners. In
addition, the compulsory return code C
can be placed in the middle of a line as shown in
Fig. 7.
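
The effect of the analysis prohibition codes can be sketched as a filter that
withholds the enclosed lines from structural analysis. The Python fragment
below is an illustration only: "@" is a hypothetical stand-in for the command
symbol, the function name split_analyzable is invented, and it is assumed
here that the start and end line codes each occupy a line of their own.

```python
SYMBOL = "@"  # assumed stand-in for the command symbol


def split_analyzable(lines):
    """Separate lines to be analyzed from lines inside a prohibited portion."""
    analyzable, protected = [], []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == SYMBOL + "X":   # analysis prohibition start line code
            inside = True
            continue
        if stripped == SYMBOL + "Y":   # analysis prohibition end line code
            inside = False
            continue
        (protected if inside else analyzable).append(line)
    return analyzable, protected


analyzable, protected = split_analyzable([
    "The relation is given by",
    "@X",
    "y = a x + b",
    "@Y",
    "where a and b are constants.",
])
```

The protected lines are passed through unanalyzed, so that their layout is
left as written.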

As described, according to this embodiment it
is possible to have a document formatting
apparatus which is both simply manipulatable and
consistently correct. It can be seen from the above
description that this is due to the particular use of
both the automatic extraction of logical and referential
structures, and the command codes. Consequently,
according to this embodiment the misapprehension
inevitably accompanying the completely automatic
structural analysis can be rectified by the use of
the command codes. At the same time, as this
embodiment requires the use of the command
codes in only those places which may cause
misapprehension, and not elsewhere, the
encumbrances associated with the conventional usage of
the command codes can be lessened enormously.

It is to be noted that although in the above
embodiment the command codes are placed at
the top of lines, this can easily be modified to allow
placement of the command code anywhere in the
lines.

Also, although in the above embodiment the
only command code which is directly concerned
with layout of the document is the compulsory
return code, it is possible to incorporate more
complicated command codes such as font
command codes and style command codes used in
'Scribe', thereby enhancing the scope of
possible formatting.

Also, although in the above embodiment the 
structural analysis is performed line by line, this 
can be performed in so-called Top-Down fashion, 
i.e., the entire document all at once. 

Moreover, the analysis prohibition codes
X and Y can be modified such
that within a portion indicated by them the analysis
is to be performed by a particular processing
system such as 'TeX', which is known to be highly
effective in dealing with mathematical formulae.

Furthermore, this embodiment can be effectively
employed not only for the document containing
references to the graphics as in the examples in
the foregoing description, but also for the
document having references to a list of references or
bibliography, and for relational data-bases.

Besides these, many modifications and vari- 
ations of the above embodiment may be made 
without departing from the novel and advantageous 
features of the present invention. Accordingly, all 
such modifications and variations are intended to 
be included within the scope of the appended 
claims. 



Claims 



1. An apparatus for formatting a document
which contains command codes indicating
prescribed commands concerning structural function,
comprising:

means (10) for entering the document to the
apparatus;
means (60) for structurally analyzing the document
and deriving structural information of the document;
means (50) for detecting the command codes; and
means (80) for carrying out formatting of the
document in accordance with the structural information;
characterized in that:
the detecting means (50) also adjusts the analysis by
the structurally analyzing means (60) in accordance
with the indications of the command codes.

2. The apparatus of claim 1, wherein the
structurally analyzing means (60) carries out automatic
analysis of the document and derivation of the
structural information for portions of the document
without the command codes.

3. The apparatus of claim 2, wherein the
structurally analyzing means (60) analyzes logical
structure of the document.

4. The apparatus of claim 2, wherein the
structurally analyzing means (60) analyzes referential
structure of the document.

5. The apparatus of claim 1, wherein the
command codes also indicate prescribed commands
concerning format, and wherein the carrying out
means (80) carries out the formatting in accordance
also with the commands concerning format
indicated by the command codes.

6. A method of formatting a document which
contains command codes indicating prescribed
commands concerning structural function,
comprising the steps of:

structurally analyzing (103) the document and
deriving structural information of the document;
detecting (102) the command codes; and
formatting (115) the document in accordance with
the structural information;
characterized by further comprising the step of:
adjusting (105, 107, 109, 111) the analysis at the
structurally analyzing step in accordance with the
indications of the command codes.

7. The method of claim 6, wherein at the
structurally analyzing step (103) automatic analysis of
the document and derivation of the structural
information are carried out for portions of the
document without the command codes.

8. The method of claim 7, wherein at the struc- 
turally analyzing step(103) logical structure of the 
document is analyzed. 

9. The method of claim 7, wherein at the
structurally analyzing step (103) referential structure of
the document is analyzed.

10. The method of claim 8, wherein the
command codes also indicate prescribed commands
concerning format, and wherein at the formatting
step (115) the formatting is carried out in
accordance also with the commands concerning format
indicated by the command codes.
0) COMMAND CODE BEGINS WITH*

1) LOGICAL STRUCTURE CODE 
COMMAND CODES 




LOGICAL ATTRIBUTES 
DATE 

DOCUMENT NUMBER 

TITLE 

AUTHOR 

MEMBERSHIP 

ADDRESS 

ABSTRACT 

HEADER 

PARAGRAPH 

BIBLIOGRAPHY 

APPENDIX 

FORWARD ADDRESS 

RETURN ADDRESS 




2) ANALYSIS PROHIBITION CODE

X   ANALYSIS PROHIBITION START LINE CODE
Y   ANALYSIS PROHIBITION END LINE CODE

3) GRAPHIC REFERENCE CODE

S   REFERENCE START POINT CODE
ZF  REFERRED FILE CODE
ZP  REFERRED GRAPHIC LAYOUT CODE
E   REFERENCE END POINT CODE

4) COMPULSORY RETURN CODE

C   COMPULSORY RETURN CODE
INTELLIGENT DOCUMENT PROCESSING SYSTEM




AUTH RACHI YOZAN
SECT NICHIBEI SOFTWARE

1. INTRODUCTION

Recently, appearances of the multi-functional 
Japanese word-processor and the desk-top 
publishing made it possible to deal with 



LOGICAL STRUCTURE EXTRACTION METHOD
Our intelligent document processing
system uses the logical structure extraction
method indicated by
S Fig. 10 ZF bunsho-1 ZP d E.